A minimal stochastic model for influenza evolution 
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We introduce and discuss a minimal individual-based model for influenza dynamics. The model 
takes into account the effects of specific immunization against viral strains, but also infectivity 
randomness and the presence of a short-lived strain transcending immunity recently suggested in 
the literature. We show by simulations that the resulting model exhibits substitution of viral strains 
along the years, but that their divergence remains bounded. We also show that dropping any of these 
features results in a drastically different behavior, leading either to the extinction of the disease, to 
the proliferation of the viral strains, or to their divergence. 
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I. INTRODUCTION 

Influenza Q exhibits two apparently contradictory fea- 
tures: on the one hand, any given individual can get in- 
fected with the disease over and over again, since the 
virus mutates fast enough to escape acquired immunity; 
on the other hand, on any given epidemic season, the 
viral strain is sufficiently well-defined, so that an effec- 
tive vaccine can be identified. The fact that circulating 
viral strains arc closely related is also exhibited by the 
shape of its phylogenetic trees ||. Influenza A, the 
most relevant epidemiologically, can be distinguished in 
several subtypes, according to the nature of their capside 
proteins hemagglutinin (H) and neuroaminidase (N). The 
currently prevailing strains belong to the H3N2 subtype. 
The phylogenetic relationships of different strains within 
a subtype are usually reconstructed from the hemagglu- 
tinin (HA) sequences, since this protein appears to be 
highly prone to substitutions. The resulting tree has a 
characteristic comb-like shape, with a well-defined back- 
bone and several short-lived side branches. This has been 
contrasted with the trees of other viruses, like HIV or the 
measles virus, which show more ramified patterns [Q. 

This problem has been recently addressed by Ferguson 
et al. |^] , who identified a short-term strain-transcending 
immunity as the essential factor to avoid the dichotomy 
between extinction and strain proliferation, and obtained 
phylogenetic trees quite similar to the one observed. 
However, the model proposed in ref. Q contains a very 
high number of parameters, including a nontrivial source 
of heterogeneity among individuals in the geographic pat- 
tern of transmission. 

We build up in this paper what we believe is a mini- 
mal model explaining this feature of influenza epidemiol- 



ogy within a subtype. The model takes into account the 
genetic drift of the virus and the effects of specific im- 
munization. It also features a short-term cross-immunity 
effective against all viral strains (short immunity), in- 
troduced in ref. pH, and the competition between viral 
strains, not previously considered, due to their different 
infectivities. The possible relevance of such an effect is 
supported by a recent study Q which shows that single- 
nucleotide substitutions can lead to large fitness changes 
in RNA viruses. The removal of any of these features 
from the model would impair its viability. 

In section [0| we provide a brief review of recent models 
describing the dynamics of influenza. In section [II we de- 
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scribe the model we propose. An analysis of its behavior 
and the simulation results are expounded in section IV. 
We close by a discussion of our results and an outlook on 
further research. 



II. MODELS OF INFLUENZA DYNAMICS 

Recent models of influenza dynamics combine the clas- 
sical ideas of mathematical epidemiology with aspects of 
evolutionary genetics. In a class of models, represented, 
e.g., by 0, H ||, [hJ, one assumes the existence of a single 
preferred strain at each season, at a given genetic dis- 
tance from the previously preferred one. These models 
show how the virus population can drift fast enough to 
remain close to the preferred strain: however, they do not 
address the crucial question mentioned above, namely the 
quasi-one dimensional structure of phylogenetic tree. 

In a second class of models JO], [l2| |l3| the genetic drift 
of the virus is assumed to take place in a low-dimensional 
space (d = 1,2) with only one viable mutation resolving 
the compromise of maintaining the effectiveness of the 
HA protein and escaping the detection by the immune 
system. These models can be studied quite deeply with 
combined analytical and numerical methods. They ex- 
hibit a regime with travelling waves, describing a persist- 
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ing genetic drift of the virus with a bounded diversity. 
Although from a pragmatic point of view these mod- 
els provide a reasonable representation of the observa- 
tions, directionality in evolution is assumed rather than 
derived. This has different problems: Simple stochastic 
variations of the model do not lead to the desired be- 
havior. A stochastic process that has on average only 
one escape direction at a given time would face extinc- 
tion after a finite number of steps. Moreover, a detailed 
analysis of HA sequences in ref. jl4|, [l5| identifies a non 
trivial structure of clusters that succeed one another in 
time with abrupt jumps in protein Hamming distance 
from one cluster to another. This gives circumstantial 
evidence that a larger than one-dimensional manifold in 
genomic space is involved in the process. 

The most structured attempt to derive a working 
model of interaction between viral strain evolution and 
epidemiological dynamics is represented by ref. [9. In 
this model a complex spatial structure of the host pop- 
ulation as well as a detailed parametrization of the dy- 
namics of infection and recovery is introduced, and the 
viral evolution takes place in a high-dimensional genomic 
space. The main result of the authors is that in order to 
avoid the proliferation of strains it is necessary to assume 
that infection to a given virus, in addition to conferring 
long term specific immunity to close strains, it also elicits 
a short immunity against all possible variants. Unfortu- 
nately the models contains a high number of parameters 
and it is difficult to isolate this mechanism from the dif- 
ferent details of the model. 



III. BUILDING UP THE MODEL 

We use an individual-based model generalizing the bit- 
string model introduced in ref. . We consider a pop- 
ulation of N individuals, which can be host to the virus. 
The antigenic features of the virus are summarized in a 
binary string a of length L. Each host can be in one of 
the following states: 

Healthy: The host can be infected by suitable strains of 
the virus, depending on its acquired immunity; 

Infected: The host is infected by a unique viral strain 

a-; 

Recovered: The host is not infected and cannot be in- 
fected by any viral strain. This state represents 
those individuals which are protected by the short 
immunity against infection by any influenza strain. 

The immunity acquired by any host i in its lifetime is de- 
scribed by the temporally ordered set E^ of strains which 
have infected it in the past. A viral strain a cannot in- 
fect a host i if the set E^ contains one or more strains 
a' such that the Hamming distance dn(o~, a ) < r, where 
r is the immunity range. Individuals are removed with 
rate A -1 , independently of their state (where A is the av- 
erage lifetime), and replaced by healthy individuals with 



a virgin immune state (E = 0). Similarly, the duration of 
the illness and that of the recovered state are exponen- 
tially distributed with averages r and 77 respectively. The 
memory set E^ has a maximal length £q. If an individual 
has been infected by more than £q strains, only the most 
recent £q ones are remembered. Since at any given time 
"too old" variants are completely extinct and out of the 
immunity range relative to the active ones, the dynamics 
of the model is independent of £q if this parameter is large 
enough. We have used £$ = 50 in the simulations, but in- 
dependence is already reached for £q ~ 20. The recovered 
state represents the short immunity introduced by Fergu- 
son et al. H . The disease is transmitted through random 
encounters between infected and healthy individuals, as- 
suming homogeneous mixing, in the dynamical process 
that we now describe. 

At each time step (representing a duration dt/N) an 
individual i is picked up at random. If the individual 
is infected, one first checks for possible mutations of the 
virus: with probability fidt the state of one of the bits of 
its strain ui is changed. Then one picks up at random v 
different individuals in the population, where v is Pois- 
son distributed with average f3(o-i)dt/r, where f3(o~i) is 
the infectivity of the viral strain o~i . If one of these indi- 
viduals is healthy, and its immune memory does not elicit 
immunity, it becomes infected with strain Cj. Then, with 
probability di/r, the individual i goes to the recovered 
state, and the strain Ui is added to its immune mem- 
ory Ej. If the individual i is recovered, then it moves 
to the healthy state with probability di/r/. The infec- 
tivities (3(a) are assumed to be independent, identically 
distributed random variables for each strain <r, with a 
gamma distribution of average (3q and parameter k: 

p((3;k,(3 )oc(^j k 'e-W. (1) 

The gamma distribution has the advantage of being easy 
to implement and of being naturally defined only for non- 
negative values of (3. 

The values of the parameters are chosen to be close to 
realistic estimates. One of the problems we meet is to 
consider populations large enough to avoid extinctions 
due to stochastic fluctuations, and to reasonably imple- 
ment the immune memory. We could simulate in reason- 
able time systems of size up to N = 500000. We choose 
the year as unit of time and set the average duration r 
of the illness to 0.02 (i.e., roughly one week). We set 
the average duration -q of the recovered state as 0.75 (i.e, 
after 6 months one has ~ 50% probability of being no 
more immune). 

According to ref. |]l6|] , the spontaneous genomic muta- 
tion rate /j g for influenza A viruses equals roughly one 
mutation per genome per replication. Taking into ac- 
count the duration of a viral generation (a few hours) 
and the fact that we are looking at a small portion of 
the genome (L = 32 in the simulations that follow) we 
can set fi ~ 1 per strain per year as a good order of 
magnitude. 
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We implemented two versions of this model, that differ 
by the duration of the elementary time step. This dif- 
ference affects the details of the dynamics, but not the 
overall behavior of the model. In the slow version, the 
elementary time step dt was taken as small as 0.001, i.e., 
about 8 hours. In the fast version, we took dt — r (the 
duration of the illness), i.e., at the end of the infection 
process, the infected individual was systematically moved 
to the recovered state. The fast version was used to ex- 
plore the phase space of the model, and the behavior of 
the system in the interesting regime was then analyzed 
in details with the slow version. 



IV. RESULTS 

We first look at the simple version of the model, in 
which the duration of the recovered state is negligible, 
and the infectivity (3 is not random. This model coincides 
with the bitstring model introduced by Girvan et al. jy] , 
and exhibits only two possible behaviors: extinction of 
the viral disease, or proliferation of the viral strains. We 
show in fig. [j] the results of the simulation of the model 
with a population of 500000 individuals. 

All results quoted in this explorative stage were ob- 
tained with the fast version of the dynamics. 

An important parameter characterizing the viral pop- 
ulation at a given time is the effective number of circu- 
lating strains n, defined as the inverse participation ratio 
of the numbers v a of individuals infected by strain a, 
namely 
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Since due to mutations one can have many different 
strains, each infecting only a small number of individuals, 
the effective number of strains can be much smaller than 
the total one. We also monitored some quantities which 
could give us some insight into the way the viral strains 
explore sequence space. The system is initialized with a 
single viral strain a = <tq = (0,0, .. . ,0), and mutations 
set to 1 some of these zeros. We could thus evaluate, at 
least initially, the drift of the strain, by computing the 
average of the Hamming distance of the active strains 
from the initial point: 
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Here I is the incidence of the disease, i.e., the total num- 
ber of infected: 
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We call "active" strains those for which v a > 0. On 
the other hand, the cloud of active viral strains broadens 
as it drifts. Its width can be estimated by evaluating 



the average mutual Hamming distance between active 
strains, weighted by their incidence: 
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where the sum runs over all distinct pairs (<r, a') of active 
strains. 

It is interesting that in this regime, as already noticed 
in ref . [[ll| , the disease gets extinct at large values of the 
infectivity. With our data, e.g., for r = 2, /j, = 1, extinc- 
tion occurs for [3q < 2, but also for (3q > 8. This effect 
is due to the fact that the initial ripple in the number of 
infected individuals is followed by a severe bottleneck, as 
shown in fig. [j]. The bottleneck becomes more intense as 
the infectivity increases, eventually leading to extinction. 
The eventual proliferation of viral strains makes it diffi- 
cult to analyze this model by generalizing the occupation 
number representation framework used, e.g., in J12[ to a 
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Since this model cannot sustain the nonproliferating 
strain regime characteristic of influenza, we considered 
if infectivity randomness alone could be responsible for 
it. Indeed, recent studies Q have shown that single- 
nucleotide substitutions lead to a wide distribution of 
fitness effects in an RNA virus. It is likely that some of 
these effects also arise in the short nucleotide sequence 
coding for the immunologically relevant section of NA. 
We have thus attached to each strain a a value (i a of the 
infectivity, drawn from a gamma distribution of average 
/3o and parameter k. 

The behavior of the system in the presence only of the 
randomly distributed infectivity does not appear much 
different from that of the simple bitstring model. There 
are only the proliferation and the extinction regimes. If 
anything, the phase space allotted to the proliferating 
regime is reduced, because the high average values of the 
infectivity. The illness did not appear to remain, with 
our mutation rate and population size, for ranges r > 3. 
The average infectivity values in the population appear 
much larger than the average /3o of the distribution, sug- 
gesting that the competition takes place at the tail of the 
distribution. This behavior does not depend strongly on 
the parameter k, although the average values of the in- 
fectivity become smaller as k is increased. 

If the infectivity is nonrandom, but the short-term gen- 
eral immunity is present, the disease either dies off or, 
after a few ripples, reaches a steady state at an incidence 
level (number of infected) of the order of 



r = n- 
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In this regime, however, the effects of specific immu- 
nization are apparent only in the small reduction of the 
steady-state incidence level, as seen in fig. |^, left. On the 
other hand, one can see in fig. [5] that the effective number 
of active strains remains high, and most of all that they 
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FIG. 1: Simulation of the bitstring model with 500000 individuals. Parameters: duration of the illness r = 0.02, lifetime 
A = 50, infectivity j3 = fio = 3, immunity range r = 1. Left: Continuous line (left axis): number of infected as a function of 
time. Dashed line (right axis): effective number of strains vs. time. Right: Continuous line: Average Hamming distance of the 
active strains from the origin. Dashed line: Average Hamming distance between active strains. 
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FIG. 2: Behavior of the system with randomly distributed infectivity. Expected value of the infectivity /3q = 3, parameter 
k — 3, other parameters as in fig. ^. Left: Continuous line (left axis): Number of infected vs. time. Dashed line (right axis): 
Effective number of strains. Right: Continuous line (left axis): Average value of the infectivity. Dashed line (right axis): 
Average Hamming distance among active strains. Notice the proliferation and the divergence of the strains. 



diverge so that the average Hamming distance soon gets 
close to the theoretical value for a random sample. 

The picture changes if both random infectivity and 
short immunity are introduced. We show in fig. ^ the be- 
havior of the system with short immunity of duration 0.75 
years, and gamma-distributed infectivity with parameter 
k = 3 and an expected value /3rj = 3. Although the inci- 
dence of the disease settles down to a level close to the 
stationary level dictated by eq. (^), one can see that the 
effective number of strains remains of order unity. The 
competition among viral strains shows up quite clearly in 
the oscillating behavior of n(t) and of Sn(t). One can see 
that the actual average values of the infectivity remain 
quite high and that, in spite of the ongoing competition, 
the incidence of the disease does not show pronounced 
oscillations. 

We can now consider the slow version of the program, 
in order to analyze the behavior of the model in details. 



We show in fig. |5| the results of a simulation in which the 
time interval dt is taken equal to 10~ 3 years (correspond- 
ing to a few hours), and the other parameters are as in 
fig. [|, apart from the genome length L, which is set equal 
to 128 in order to reduce the probability of back muta- 
tions. While the overall incidence level remains close to 
that previously obtained, one can see that spontaneous 
oscillations in the number of infected persist, and are syn- 
chronized with corresponding oscillations in the effective 
number of strains and in the width of the distribution 
of strains in sequence space. In this systems, something 
analogous to the working regime of influenza appears to 
have been reached. One may also notice that the diver- 
gence between active strains, measured by the average 
Hamming distance, remains limited even if the average 
distance from the origin increases with time. The active 
strains show therefore ongoing change, but with a limited 
amount of divergence. 
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FIG. 3: Behavior of the system with short immunity but nonrandom infectivity. Infectivity /3 = /3o = 3, short immunity 
duration n = 0.75. Other parameters as in fig. |l|. Left: Continuous line (left axis): Number of infected vs. time. Horizontal 
line: expected number of infected according to eq. (^|). Dashed line (right axis): Effective number of strains. Although the 
number of infected remains close to the equilibrium value 7*, the effective number of strains reaches a large value, corresponding 
to proliferation. Right: Continuous line: Average Hamming distance from the origin of the active strains. Dashed line: Average 
Hamming distance between active strains. One sees that the distance between active strains keeps increasing, witnessing their 
divergence. (With L = 32, the average distance in a random sample would be dn = 16.) 



These results still hold if one introduces a small mod- 
ulation with a period of one year in the infectivity [ p0[ , 
by letting, for each viral strain a, 



fl,(t) =Ml + mosin(i/27r)). 



(7) 



However, one can see that the characteristic time of 
the incidence ripples in our model correspond to a few 
months, as if it were essentially determined by the dura- 
tion of the short immunity and of the illness, instead of 
by the viral strain turnover. Therefore a small modula- 
tion does little more than modulate the amplitude of the 
ripples, as shown in fig. pi 



V. DISCUSSION 

In this paper we addressed the problem of understand- 
ing the dynamical mechanisms underlying antigenic drift 
of the influenza A virus. In particular we revisited the 
problem outlined in || of the existence of a constantly 
evolving well-defined strain giving rise to comb- like shape 
phylogenetic trees, i.e., to a constantly evolving viral pop- 
ulation with comparatively narrow distribution in genetic 
space. We considered a minimal, individual based model 
that couples epidemic dynamics and viral evolution. Our 
main finding is that the absence of strain proliferation 
relates equally importantly to the large spectrum short- 
time cross- immunity emphasized in [0 and to heterogene- 
ity in the effective viral infectivity. In our model we di- 
rectly suppose that different strains have different values 
of the infectivity. Our results show that even a compar- 
atively narrow distribution of infectivity, in combination 
with the increased competition due to the presence of the 
short immunity, is sufficient to lead to a "drifting quasis- 
pecies" behavior. We have also seen that, on the other 



hand, such a behavior has a very narrow range of stabil- 
ity, if any, if either the short immunity or the random in- 
fectivity is lacking, at least in a model of a comparatively 
large population without spatial structure, like the one 
we have considered. We expect, on the basis of Kimura's 
theory of selection in finite populations that, as the 
population becomes larger and larger, the spread in viral 
infectivity needed to stabilize influenza behavior becomes 
narrower and narrower. 

Perhaps obscured by the emphasis on short time im- 
munity, a different mechanism was present in B . In that 
case the heterogeneity was provided by a non trivial "ge- 
ographical distribution" of individuals. These were as- 
sumed to be randomly distributed on a two dimensional 
space. As an effect, the competition between different vi- 
ral strains is enhanced, since some geographical locations 
acquire a leading role in spreading the disease, and the 
first strains to establish themselves in these locations ac- 
quire a standing advantage. However, the possibility that 
different strains are also characterized by different infec- 
tivities is quite natural and yields a realistic evolution- 
ary dynamics. It appears that some heterogeneity and a 
mechanism for an enhanced competition among strains 
is all that is needed to reproduce the observed evolu- 
tionary pattern. Most probably, varying infectivities for 
different strains and a heterogeneous pattern of contacts 
among individuals in different communities both play a 
role in influenza spreading and viral evolution. We think 
that this point deserves further research. Understanding 
which contact and social patterns favor an increased ef- 
fective infectivity could lead to more effective policies to 
keep under control influenza epidemics. 

Finally, phylogenetic trees similar to the ones of the 
influenza virus have been observed in in-host HIV evolu- 
tion 0] . The present work might be a stimulus to identify 
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the factors that play the role corresponding to short im- 
munity and heterogeneous infectivities in that case. 
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FIG. 4: Behavior of the system with randomly distributed infectivity and short immunity. Expected value of the infectivity 
Po = 3, parameter k — 3, range r — 1, short immunity duration r\ = 0.75. Other parameters as in fig. jjj. Left: Continuous 
line (left axis): Number of infected vs. time. Horizontal line: expected number of infected according to eq. (^). Dashed line 
(right axis): Effective number of strains. Right: Continuous line (left axis): Average value of the infectivity. Dashed line (right 
axis): Average Hamming distance among active strains. Dotted line: Average Hamming distance from the origin for the active 
strains. Notice that the number of active strains and their mutual distance remain limited while exploring sequence space. 
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FIG. 5: Behavior of the slow version of the model with randomly distributed infectivity and short immunity. Expected value 
of the infectivity /3o = 3, parameter k — 3, range r — 1, short immunity duration r) = 0.75, genome length L — 128. Other 
parameters as in fig. |l| Left: Continuous line (left axis): Number of infected vs. time. Dashed line (right axis): Effective number 
of strains. Right: Continuous line (left axis): Average value of the infectivity. Dashed line (right axis): Average Hamming 
distance among active strains. Dotted line: Average Hamming distance from the origin for the active strains. The incidence 
ripples are more evident in this version since the correlations in the immunity state of the population are more correctly taken 
into account. 



8 




IV 



2 4 6 



i j S 



16 18 20 



FIG. 6: Behavior of the slow version of the model with randomly distributed infectivity and short immunity, in the presence 
of a modulation of the infectivity according to eq. (Q). Modulation mo = 0.2. Other parameters as in fig. |H| Left: Continuous 
line (left axis): Number of infected vs. time. Dashed line (right axis): Effective number of strains. Right: Continuous line (left 
axis): Average value of the infectivity. Dashed line (right axis): Average Hamming distance among active strains. Dotted line: 
Average Hamming distance from the origin for the active strains. The incidence ripples are more evident in this version since 
the correlations in the immunity state of the population are more correctly taken into account. 



